Exploration server on the visibility of Le Havre

Warning: this site is under development.
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Information extraction in unconstrained handwritten documents: application to automatic processing of handwritten incoming mail

Internal identifier: 000567 (Main/Exploration); previous: 000566; next: 000568

Author: S. Thomas [France]

RBID : Hal:tel-00863502

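French descriptors: Reconnaissance de l'écriture, architectures profondes, modèle hybride

English descriptors: HMM, Offline Handwriting Recognition, Out-Of-Vocabulary Model, deep neural network, keyword spotting, neuro markovian model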

Abstract

Despite the world's move into the digital era, a large number of handwritten documents continue to be exchanged, forcing companies and administrations to process large volumes of documents. Automatic processing of these documents requires access to an unknown but relevant part of their content, and involves three key points: segmenting the document into relevant entities, recognizing those entities, and rejecting irrelevant ones. Unlike traditional approaches (full-document reading or keyword detection), all three processes are carried out in parallel, leading to an information extraction approach. The first contribution of this work is the design of a generic text line model for information extraction and the implementation of a complete system based on Hidden Markov Models (HMM) constrained by this model. In a single pass, the recognition module seeks to discriminate relevant information, characterized by a set of alphabetic, numeric or alphanumeric queries, from irrelevant information, characterized by a filler model. A second contribution concerns improving local frame discrimination by using a deep neural network. This makes it possible to infer high-level representations of the frames and thus to automate the feature extraction process. The result is a complete, generic and industrial system that responds to emerging needs in the field of automatic handwritten document reading: the extraction of complex information from unconstrained documents.

Url: https://tel.archives-ouvertes.fr/tel-00863502


The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Information extraction in unconstrained handwritten documents: application to automatic processing of handwritten incoming mail</title>
<title xml:lang="fr">Extraction d'information dans des documents manuscrits non contraints : application au traitement automatique des courriers entrants manuscrits</title>
<author>
<name sortKey="Thomas, S" sort="Thomas, S" uniqKey="Thomas S" first="S." last="Thomas">S. Thomas</name>
<affiliation wicri:level="1">
<hal:affiliation type="laboratory" xml:id="struct-23832" status="VALID">
<orgName>Laboratoire d'Informatique, de Traitement de l'Information et des Systèmes</orgName>
<orgName type="acronym">LITIS</orgName>
<desc>
<address>
<addrLine>Avenue de l'Université UFR des Sciences et Techniques 76800 Saint-Etienne du Rouvray</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.litislab.eu</ref>
</desc>
<listRelation>
<relation active="#struct-300317" type="direct"></relation>
<relation name="EA4108" active="#struct-300318" type="direct"></relation>
<relation active="#struct-301288" type="direct"></relation>
<relation active="#struct-301232" type="indirect"></relation>
</listRelation>
<tutelles>
<tutelle active="#struct-300317" type="direct">
<org type="institution" xml:id="struct-300317" status="VALID">
<orgName>Université du Havre</orgName>
<desc>
<address>
<country key="FR"></country>
</address>
</desc>
</org>
</tutelle>
<tutelle name="EA4108" active="#struct-300318" type="direct">
<org type="institution" xml:id="struct-300318" status="VALID">
<orgName>Université de Rouen</orgName>
<desc>
<address>
<addrLine> 1 rue Thomas Becket - 76821 Mont-Saint-Aignan</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.univ-rouen.fr/</ref>
</desc>
</org>
</tutelle>
<tutelle active="#struct-301288" type="direct">
<org type="department" xml:id="struct-301288" status="VALID">
<orgName>Institut National des Sciences Appliquées - Rouen</orgName>
<orgName type="acronym">INSA Rouen</orgName>
<desc>
<address>
<country key="FR"></country>
</address>
</desc>
<listRelation>
<relation active="#struct-301232" type="direct"></relation>
</listRelation>
</org>
</tutelle>
<tutelle active="#struct-301232" type="indirect">
<org type="institution" xml:id="struct-301232" status="VALID">
<orgName>Institut National des Sciences Appliquées</orgName>
<orgName type="acronym">INSA</orgName>
<desc>
<address>
<country key="FR"></country>
</address>
</desc>
</org>
</tutelle>
</tutelles>
</hal:affiliation>
<country>France</country>
<placeName>
<settlement type="city">Le Havre</settlement>
<region type="region" nuts="2">Région Normandie</region>
<region type="old region" nuts="2">Haute-Normandie</region>
</placeName>
<orgName type="university">Université du Havre</orgName>
<placeName>
<settlement type="city">Rouen</settlement>
<region type="region" nuts="2">Région Normandie</region>
<region type="old region" nuts="2">Haute-Normandie</region>
</placeName>
<orgName type="university">Université de Rouen</orgName>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">HAL</idno>
<idno type="RBID">Hal:tel-00863502</idno>
<idno type="halId">tel-00863502</idno>
<idno type="halUri">https://tel.archives-ouvertes.fr/tel-00863502</idno>
<idno type="url">https://tel.archives-ouvertes.fr/tel-00863502</idno>
<date when="2012-07-12">2012-07-12</date>
<idno type="wicri:Area/Hal/Corpus">000176</idno>
<idno type="wicri:Area/Hal/Curation">000176</idno>
<idno type="wicri:Area/Hal/Checkpoint">000294</idno>
<idno type="wicri:Area/Main/Merge">000571</idno>
<idno type="wicri:Area/Main/Curation">000567</idno>
<idno type="wicri:Area/Main/Exploration">000567</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en">Information extraction in unconstrained handwritten documents: application to automatic processing of handwritten incoming mail</title>
<title xml:lang="fr">Extraction d'information dans des documents manuscrits non contraints : application au traitement automatique des courriers entrants manuscrits</title>
<author>
<name sortKey="Thomas, S" sort="Thomas, S" uniqKey="Thomas S" first="S." last="Thomas">S. Thomas</name>
<affiliation wicri:level="1">
<hal:affiliation type="laboratory" xml:id="struct-23832" status="VALID">
<orgName>Laboratoire d'Informatique, de Traitement de l'Information et des Systèmes</orgName>
<orgName type="acronym">LITIS</orgName>
<desc>
<address>
<addrLine>Avenue de l'Université UFR des Sciences et Techniques 76800 Saint-Etienne du Rouvray</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.litislab.eu</ref>
</desc>
<listRelation>
<relation active="#struct-300317" type="direct"></relation>
<relation name="EA4108" active="#struct-300318" type="direct"></relation>
<relation active="#struct-301288" type="direct"></relation>
<relation active="#struct-301232" type="indirect"></relation>
</listRelation>
<tutelles>
<tutelle active="#struct-300317" type="direct">
<org type="institution" xml:id="struct-300317" status="VALID">
<orgName>Université du Havre</orgName>
<desc>
<address>
<country key="FR"></country>
</address>
</desc>
</org>
</tutelle>
<tutelle name="EA4108" active="#struct-300318" type="direct">
<org type="institution" xml:id="struct-300318" status="VALID">
<orgName>Université de Rouen</orgName>
<desc>
<address>
<addrLine> 1 rue Thomas Becket - 76821 Mont-Saint-Aignan</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.univ-rouen.fr/</ref>
</desc>
</org>
</tutelle>
<tutelle active="#struct-301288" type="direct">
<org type="department" xml:id="struct-301288" status="VALID">
<orgName>Institut National des Sciences Appliquées - Rouen</orgName>
<orgName type="acronym">INSA Rouen</orgName>
<desc>
<address>
<country key="FR"></country>
</address>
</desc>
<listRelation>
<relation active="#struct-301232" type="direct"></relation>
</listRelation>
</org>
</tutelle>
<tutelle active="#struct-301232" type="indirect">
<org type="institution" xml:id="struct-301232" status="VALID">
<orgName>Institut National des Sciences Appliquées</orgName>
<orgName type="acronym">INSA</orgName>
<desc>
<address>
<country key="FR"></country>
</address>
</desc>
</org>
</tutelle>
</tutelles>
</hal:affiliation>
<country>France</country>
<placeName>
<settlement type="city">Le Havre</settlement>
<region type="region" nuts="2">Région Normandie</region>
<region type="old region" nuts="2">Haute-Normandie</region>
</placeName>
<orgName type="university">Université du Havre</orgName>
<placeName>
<settlement type="city">Rouen</settlement>
<region type="region" nuts="2">Région Normandie</region>
<region type="old region" nuts="2">Haute-Normandie</region>
</placeName>
<orgName type="university">Université de Rouen</orgName>
</affiliation>
</author>
</analytic>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="mix" xml:lang="en">
<term>HMM</term>
<term>Offline Handwriting Recognition</term>
<term>Out-Of-Vocabulary Model</term>
<term>deep neural network</term>
<term>keyword spotting</term>
<term>neuro markovian model</term>
</keywords>
<keywords scheme="mix" xml:lang="fr">
<term>Reconnaissance de l'écriture</term>
<term>architectures profondes</term>
<term>modèle hybride</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">Despite the world's move into the digital era, a large number of handwritten documents continue to be exchanged, forcing companies and administrations to process large volumes of documents. Automatic processing of these documents requires access to an unknown but relevant part of their content, and involves three key points: segmenting the document into relevant entities, recognizing those entities, and rejecting irrelevant ones. Unlike traditional approaches (full-document reading or keyword detection), all three processes are carried out in parallel, leading to an information extraction approach. The first contribution of this work is the design of a generic text line model for information extraction and the implementation of a complete system based on Hidden Markov Models (HMM) constrained by this model. In a single pass, the recognition module seeks to discriminate relevant information, characterized by a set of alphabetic, numeric or alphanumeric queries, from irrelevant information, characterized by a filler model. A second contribution concerns improving local frame discrimination by using a deep neural network. This makes it possible to infer high-level representations of the frames and thus to automate the feature extraction process. The result is a complete, generic and industrial system that responds to emerging needs in the field of automatic handwritten document reading: the extraction of complex information from unconstrained documents.</div>
</front>
</TEI>
<affiliations>
<list>
<country>
<li>France</li>
</country>
<region>
<li>Haute-Normandie</li>
<li>Région Normandie</li>
</region>
<settlement>
<li>Le Havre</li>
<li>Rouen</li>
</settlement>
<orgName>
<li>Université de Rouen</li>
<li>Université du Havre</li>
</orgName>
</list>
<tree>
<country name="France">
<region name="Région Normandie">
<name sortKey="Thomas, S" sort="Thomas, S" uniqKey="Thomas S" first="S." last="Thomas">S. Thomas</name>
</region>
</country>
</tree>
</affiliations>
</record>
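
As an illustration of how the record above might be read programmatically, here is a minimal Python sketch that groups the `<term>` entries by their `xml:lang` value. It is not part of Dilib: the function name `keywords_by_language` and the constant `FRAGMENT` are hypothetical, and the fragment is copied by hand from the `<textClass>` block of the record. Only that fragment is parsed because the full record, as shown here, uses the `wicri:` and `hal:` prefixes without any visible namespace declaration, so a namespace-aware parser would likely reject it unless those declarations are added.

```python
import xml.etree.ElementTree as ET

# Fragment copied from the <textClass> block of the record above.
# It only uses the reserved xml: prefix, which needs no declaration.
FRAGMENT = """
<textClass>
<keywords scheme="mix" xml:lang="en">
<term>HMM</term>
<term>Offline Handwriting Recognition</term>
<term>Out-Of-Vocabulary Model</term>
<term>deep neural network</term>
<term>keyword spotting</term>
<term>neuro markovian model</term>
</keywords>
<keywords scheme="mix" xml:lang="fr">
<term>Reconnaissance de l'écriture</term>
<term>architectures profondes</term>
<term>modèle hybride</term>
</keywords>
</textClass>
"""

# ElementTree expands the reserved xml: prefix to this namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def keywords_by_language(xml_text):
    """Return a dict mapping each xml:lang value to its list of <term> texts."""
    root = ET.fromstring(xml_text)
    result = {}
    for group in root.iter("keywords"):
        # "und" (undetermined) is an assumed fallback for a missing xml:lang.
        lang = group.get(XML_LANG, "und")
        result.setdefault(lang, []).extend(t.text for t in group.findall("term"))
    return result


if __name__ == "__main__":
    for lang, terms in keywords_by_language(FRAGMENT).items():
        print(lang, terms)
```

The same function could be applied to the whole record once the missing namespace declarations are supplied on the root element; the URIs to use for `wicri:` and `hal:` are not given in this page.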

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Wicri/France/explor/LeHavreV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000567 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 000567 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Wicri/France
   |area=    LeHavreV1
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     Hal:tel-00863502
   |texte=   Information extraction in unconstrained handwritten documents: application to automatic processing of handwritten incoming mail
}}

Wicri

This area was generated with Dilib version V0.6.25.
Data generation: Sat Dec 3 14:37:02 2016. Site generation: Tue Mar 5 08:25:07 2024